: from Corpus Compilation to Bilingual Terminologies for MT and CAT Tools
Authors
Abstract
This paper describes the TTC Web platform, an online demonstrator of the complete pipeline for compiling bilingual terminologies from comparable corpora gathered from the web, using the tools developed in the TTC project (Terminology Extraction, Translation Tools and Comparable Corpora). We present the whole chain that has been integrated into the platform, as well as its main components: a focused web crawler; a UIMA-based tool for both monolingual term extraction and bilingual term alignment; tools for monolingual term extraction using both rule-based and probabilistic methods; and an online terminology platform for editing the output of the TTC tools. The TTC tool chain is available for all the languages of the project: DE, EN, ES, FR, LV, RU and ZH. At the first Tralogy conference we presented the different users and scenarios that were envisaged, from basic users to professionals of the MT industry. In this paper we report the first feedback obtained from users during the second user workshop, which was organized to demonstrate and test the tools with potential users and experts in the MT, CAT, and terminology management domains.
Similar resources
Compiling French-Japanese Terminologies from the Web
We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compilation exploit parallel texts, while more recent ones have focused on comparable corpora. We use bilingual corpora collected from the web and tailor-made for the seed terms. For each language, we extract from the c...
Building a Spanish-German Dictionary for Hybrid MT
This paper describes the development of the Spanish-German dictionary used in our hybrid MT system. The compilation process relies entirely on open source tools and freely available language resources. Our bilingual dictionary of around 33,700 entries may thus be used, distributed and further enhanced as convenient.
Terminology-driven Augmentation of Bilingual Terminologies
This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...
Evaluation of terminologies acquired from comparable corpora: an application perspective
This paper describes a protocol for the evaluation of bilingual terminologies acquired from comparable corpora. The aim of the protocol is to assess the terminologies' added value in a specialized translation task. The protocol consists of having specialized texts translated in various situations: without any specialized resource, with a domain-related bilingual terminology or using Internet...